Propose to refactor output normalization in several transformers #11850

Draft · wants to merge 8 commits into main

Conversation

@tolgacangoz (Contributor) commented on Jul 2, 2025

(I attempted to make these replacements, if you don't mind :)

This PR will be activated when the SkyReels-V2 models' integration PR is merged into main.

Replace `FP32LayerNorm` with `AdaLayerNorm` in `WanTransformer3DModel`, `WanVACETransformer3DModel`, ..., simplifying the forward pass and enhancing model parallelism compatibility.

Context: #11518 (comment)

@yiyixuxu @a-r-r-o-w

Replace the final `FP32LayerNorm` and manual shift/scale application with a single `AdaLayerNorm` module in both the `WanTransformer3DModel` and `WanVACETransformer3DModel`.

This change simplifies the forward pass by encapsulating the adaptive normalization logic within the `AdaLayerNorm` layer, removing the need for a separate `scale_shift_table`. The `_no_split_modules` list is also updated to include `norm_out` for compatibility with model parallelism.
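For illustration, here is a minimal sketch of the shape of that replacement. The module below is a simplified stand-in for diffusers' `AdaLayerNorm`; the names (`inner_dim`, `AdaLayerNormSketch`), the dimensions, and the plain `nn.LayerNorm` used in place of `FP32LayerNorm` (which additionally upcasts to float32) are assumptions for the example, not the library's actual implementation:

```python
import torch
import torch.nn as nn

batch, seq_len, inner_dim = 2, 16, 64  # hypothetical sizes for illustration
hidden_states = torch.randn(batch, seq_len, inner_dim)
temb = torch.randn(batch, inner_dim)  # conditioning (time) embedding

# Before: a norm without affine parameters plus a separately learned scale_shift_table,
# with the shift/scale applied manually in the model's forward pass.
scale_shift_table = nn.Parameter(torch.randn(1, 2, inner_dim) / inner_dim**0.5)
norm_out = nn.LayerNorm(inner_dim, elementwise_affine=False)
shift, scale = (scale_shift_table + temb.unsqueeze(1)).chunk(2, dim=1)
out_before = norm_out(hidden_states) * (1 + scale) + shift

# After: a single adaptive-norm module derives shift/scale from temb internally,
# so the forward pass reduces to one call and the separate table parameter disappears.
class AdaLayerNormSketch(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, 2 * dim)  # takes over the role of scale_shift_table
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)

    def forward(self, x: torch.Tensor, temb: torch.Tensor) -> torch.Tensor:
        shift, scale = self.linear(temb).unsqueeze(1).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale) + shift

out_after = AdaLayerNormSketch(inner_dim)(hidden_states, temb)
```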
Updates the key mapping for the `head.modulation` layer to `norm_out.linear` in the model conversion script.

This correction ensures that weights are loaded correctly for both standard and VACE transformer models.

Replaces the manual implementation of adaptive layer normalization, which used a separate `scale_shift_table` and `nn.LayerNorm`, with the unified `AdaLayerNorm` module.

This change simplifies the forward pass logic in several transformer models by encapsulating the normalization and modulation steps into a single component. It also adds `norm_out` to `_no_split_modules` for model parallelism compatibility.
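A small, hedged sketch of the parallelism-related part; the class name below is made up, and only the idea of listing `norm_out` in `_no_split_modules` comes from this PR's description:

```python
from diffusers.configuration_utils import ConfigMixin
from diffusers.models.modeling_utils import ModelMixin

class WanLikeTransformer3DModel(ModelMixin, ConfigMixin):
    # Listing "norm_out" keeps the adaptive output norm from being split across
    # devices when the model is loaded with a device_map.
    _no_split_modules = ["WanTransformerBlock", "norm_out"]
```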
Corrects the target key for `head.modulation` to `norm_out.linear.weight`.

This ensures the weights are correctly mapped to the weight parameter of the output normalization layer during model conversion for both transformer types.
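A sketch of the kind of rename entry this refers to in the conversion script; the dictionary name `TRANSFORMER_KEYS_RENAME_DICT` and the helper function are assumptions about the script's structure, while the key strings come from the commit messages above:

```python
TRANSFORMER_KEYS_RENAME_DICT = {
    # Per this PR, head.modulation now targets the weight of the linear
    # projection inside the output AdaLayerNorm.
    "head.modulation": "norm_out.linear.weight",
    # ... other entries left unchanged ...
}

def rename_key(key: str) -> str:
    # Apply substring renames, as conversion scripts commonly do.
    for old, new in TRANSFORMER_KEYS_RENAME_DICT.items():
        key = key.replace(old, new)
    return key
```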
Adds a default zero-initialized bias tensor for the transformer's output normalization layer if it is missing from the original state dictionary.
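A hypothetical helper illustrating that fallback (the function name and key strings are assumptions, consistent with the mapping sketched above):

```python
import torch

def ensure_norm_out_bias(state_dict: dict) -> dict:
    """If the original checkpoint has no bias for the output norm's linear
    projection, insert a zero tensor so the converted model loads cleanly."""
    weight_key = "norm_out.linear.weight"
    bias_key = "norm_out.linear.bias"
    if weight_key in state_dict and bias_key not in state_dict:
        weight = state_dict[weight_key]
        state_dict[bias_key] = torch.zeros(weight.shape[0], dtype=weight.dtype)
    return state_dict
```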
@tolgacangoz changed the title from "Refactor output normalization in several transformers" to "Propose to refactor output normalization in several transformers" on Jul 3, 2025